Abstract

We present a general protocol for detecting whether a property 
holds in a distributed system, where the property is a member of a 
subclass of stable properties we call the locally stable properties. Our 
protocol is based on a decentralized method for constructing a maximal 
subset of the local states that are mutually consistent, which in turn is
based on a weakened version of vector timestamps. The structure of
our protocol lends itself to refinement, and we demonstrate its utility 
by deriving some specialized property-detection protocols, including 
two previously-known protocols that are known to be efficient. 


1 Introduction 

It is conceptually simple to determine whether the global state of a
distributed system satisfies a stable property; that is, a property Φ that
satisfies Φ ⇒ □Φ. One can have a process use a snapshot algorithm such as
the one given in [CL85] to collect the relevant local and channel states
and then test to see if the condition holds over the collected state. This
technique can be used to detect any stable property. However, for many
stable properties of interest, such as deadlock, termination, and lack of a
token, there exist specialized protocols (for example,
[Mis83,Mat87,BT84,CMH83]) that are more efficient than a straightforward
application of [CL85]. As well as being more efficient, many of these
specialized protocols are very elegant and their relation to snapshots is
not apparent.

* This work was supported by the Defense Advanced Research Projects Agency (DoD)
under NASA Ames grant number NAG 2-593, and by grants from IBM and Siemens. The
views, opinions, and findings contained in this report are those of the authors and should
not be construed as an official Department of Defense position, policy, or decision.

It would be useful if one could derive such special-purpose protocols 
by refinement of a general snapshot protocol. Unfortunately, the protocol 
of [CL85] was not developed with refinement in mind, and we have not 
found it conducive to such refinement. In this paper, we present a different 
protocol for detecting stable properties that has proven to be more conducive 
to refinement. 

A naive general detection protocol is as follows: every time a process 
executes an event, it appends its current state to a queue maintained in 
local memory. A separate process p_0 periodically retrieves these queues of
local histories and extracts from them the latest global state. Process p_0 then
tests to see if the property holds in this global state. Unfortunately, this
protocol is impractical since it has a large execution overhead and requires
unbounded local memory. This can be fixed (at a cost of generality) by
having each process record only its current state at appropriate times and
by having p_0 consider some subset of these local states that could be part
of a sensible global state of the system. Not all stable properties can be 
detected this way, but it turns out that most stable properties that have 
been discussed in the literature can. 

In this paper, we present a method to detect a subclass of stable prop- 
erties. The method can be easily expressed as a decentralized protocol and 
can be customized for different properties in order to yield efficient special- 
purpose protocols. We demonstrate its utility by using it to derive such 
protocols including two previously-known protocols that are known to be 
efficient. 


2 Definitions 


2.1 System Model 

We consider an asynchronous distributed system consisting of a set of n
nonfaulty processes P = {p_1, p_2, ..., p_n}. Between any two processes p_i
and p_j there exist two unidirectional nonfaulty FIFO channels: C_ij from
p_i to p_j and C_ji from p_j to p_i. These channels have unbounded delivery
time, and processes communicate only by sending and receiving messages over
these channels.

Processes execute events, which are partitioned into send events, receive
events, and local events. We will denote the t-th event executed by process
p_i as e_i^t and the resulting local state σ_i^t. Thus, the execution of
process p_i can be denoted (σ_i^0 e_i^1 σ_i^1 e_i^2 σ_i^2 ...). Note that
the state σ_i^t reflects the execution of events e_i^1 through e_i^t. When
the ordinality of an event or state is not important, we will drop the
superscript, e.g. the execution of event e_i results in the local state σ_i.

An arbitrary collection of local states may not constitute a sensible global 
state: the local state of a process in the collection may reflect the receipt of 
a message while no process’ local state reflects the sending of that message. 
Such sets of local states are called inconsistent; a sensible collection of local
states is called consistent [CL85].

A global state is defined to be a consistent set Σ = {σ_1, σ_2, ..., σ_n} of the
processes’ local states. We assume that channel states are captured in the 
local states of the processes. There are many ways to do this, for example 
by having each process maintain a history of all messages that it sends and 
receives. In practice, one must ensure that the representation of the channel 
states does not require an unbounded number of messages to be recorded. 

A consistent cut is defined to be a set of events C = {e_1, ..., e_n}
such that the set of states {σ_1, σ_2, ..., σ_n} produced by C is a global
state. Thus, each consistent cut has a corresponding global state, and vice
versa. In this sense, global states and consistent cuts are equivalent
notions. When defining properties of a distributed system, it is convenient
to refer to states;
our protocol uses events. For this reason, we define both global states and 
consistent cuts for use in different contexts. 

A property is a predicate over the global state of the system. A stable 
property is an invariant: once it becomes true, it continues to be true.¹ The
most commonly studied examples of stable properties in distributed systems 
are deadlock of a subset of the processes, termination of a distributed com- 
putation, and the lack of a token among the processes. There are, of course, 
other stable properties of interest. For example, in a token-passing system 
that can lose but not generate tokens, the predicate “there are no more than 
two tokens in the system” is a stable property. 

Let σ_i|Φ (read σ_i relative to Φ) be the values of the subset of variables
of σ_i that are referenced in the formulation of property Φ. An event e_i^t
is relevant to a property Φ if σ_i^{t-1}|Φ ≠ σ_i^t|Φ; that is, if e_i^t
changes p_i's local state relative to Φ by changing the value of a variable
in the formulation of Φ. For example, if Φ is "a subset of the processes
are deadlocked" then the relevant events are those that request a resource
and those that grant a resource, since Φ is formulated in terms of resource
requests and grants. Note that local events, send events, and/or receive
events can be relevant, depending on how Φ is formulated.

2.2 Vector Clocks 

Our protocol is based on a variant of vector clocks [Mat89]. In the usual
definition of a vector clock V, each event e_i has an n-component vector
V(e_i)[1..n] associated with it. V(e_i) is called the vector timestamp of
e_i. The components of V(e_i^t) are:

• V(e_i^t)[i] = t; that is, V(e_i^t)[i] is the number of events that p_i
has executed up to and including e_i^t.

• V(e_i^t)[j], j ≠ i, is the number of events p_j has executed that
causally precede e_i^t.

¹ Some authors define an invariant property to be one that is valid in all
states of the system.

As an example, Figure 1 shows a space-time diagram of a two-process system
with the events labeled by vector clocks.

Figure 1: Execution with vector clocks. (Diagram not reproduced; the
surviving event labels are x := 1, u := 1, and x := 2.)

A simple implementation of vector clocks has process p_i maintain an
n-element vector V_i of counters. Process p_i increments V_i[i] whenever it
executes an event e_i. If e_i is a local event or a send event, then
V(e_i) = V_i. If e_i is a send event, then p_i includes V_i in the message.
If e_j is the corresponding receive event, then process p_j sets, for all
k ≠ j, V_j[k] to the maximum of the previous value of V_j[k] and the value
of V_i[k] in the message, and V(e_j) = V_j.
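
The update rules in the preceding paragraph can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: the class
name VectorClockProcess and its method names are hypothetical, processes
are indexed from 0 rather than 1, and message transport is left out (the
sender's vector is simply handed to the receiver).

```python
class VectorClockProcess:
    """Process p_i holding an n-element vector V_i of counters (0-based)."""

    def __init__(self, i, n):
        self.i = i
        self.V = [0] * n

    def local_event(self):
        # Every event increments the local component V_i[i].
        self.V[self.i] += 1
        return list(self.V)            # timestamp V(e_i)

    def send_event(self):
        # The returned vector is both the timestamp and the value
        # that is included in the outgoing message.
        self.V[self.i] += 1
        return list(self.V)

    def receive_event(self, msg_V):
        # Increment the local component, then take the component-wise
        # maximum with the sender's vector for every k != i.
        self.V[self.i] += 1
        for k in range(len(self.V)):
            if k != self.i:
                self.V[k] = max(self.V[k], msg_V[k])
        return list(self.V)
```

For instance, if p_0 executes a local event and then a send, and p_1
receives that message as its first event, the receive is stamped with p_0's
count of 2 in component 0 and p_1's own count of 1 in component 1.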

The following three relations hold between vector clocks and global
states, where → is the happened-before relation defined in [Lam78].
Equation 1 defines the happened-before relation in terms of vector clocks,
Equation 2 defines when two events are consistent with each other (we call
two such events pairwise consistent), and Equation 3 defines when a set of
local states {σ_1, ..., σ_n} produced by events {e_1, ..., e_n} comprise a
(consistent) global state:

    \forall i, j : i \neq j : V(e_i)[i] \le V(e_j)[i] \;\equiv\; e_i \rightarrow e_j    (1)

    (V(e_i)[i] \ge V(e_j)[i]) \wedge (V(e_j)[j] \ge V(e_i)[j]) \;\equiv\; e_i \text{ and } e_j \text{ are pairwise consistent}    (2)

    \forall i, j : V(e_i)[i] \ge V(e_j)[i] \;\equiv\; \{\sigma_1, \ldots, \sigma_n\} \text{ is a global state}    (3)

Equation 2 can be derived by noting that two events e_i^a and e_j^b are
inconsistent only if (without loss of generality) e_i^a → e_j^b ∧
∃e_i^c : e_i^a → e_i^c → e_j^b, or in terms of vector clocks,
V(e_i^a)[i] < V(e_j^b)[i]. Equation 3 can be derived from Equation 2 by
noting that all events in a consistent cut are pairwise consistent. Observe
that Σ = {σ_1, ..., σ_n} is a global state if and only if
C = {e_1, ..., e_n} is a consistent cut.
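
As an illustration, the three relations can be written as small Python
predicates over timestamps. This is a sketch that assumes the comparisons
are non-strict (so that an event is trivially consistent with itself, and
the strict inequality above characterizes inconsistency); the function
names are hypothetical and processes are indexed from 0.

```python
def happened_before(Vi, i, Vj):
    """Equation 1: e_i -> e_j, where e_i belongs to process i and
    e_j to a different process; Vi and Vj are their timestamps."""
    return Vi[i] <= Vj[i]


def pairwise_consistent(Vi, i, Vj, j):
    """Equation 2: neither event has seen a later event of the other's
    process than the one in the pair."""
    return Vi[i] >= Vj[i] and Vj[j] >= Vi[j]


def is_global_state(stamps):
    """Equation 3: stamps[i] is the timestamp of the event of process i
    in the candidate cut."""
    n = len(stamps)
    return all(stamps[i][i] >= stamps[j][i]
               for i in range(n) for j in range(n))
```

With two processes, the cut with timestamps [2, 0] and [2, 1] is a global
state, whereas [1, 0] and [2, 1] is not: the second process has already
seen two events of the first, but the cut includes only one of them.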

When vector clocks are used in actual protocols, not all events cause a 
process’ vector clock to be updated. For example, some causal broadcast 
protocols are based on vector clocks that are updated only at send or broad- 
cast events; other events (i.e., receive and local events) do not increment the 
local component of V [Pet87,BSS90]. For our purposes, only the execution
of relevant events updates the local counter of a vector clock. This is because
the execution of a nonrelevant event does not change the state of the system
with respect to Φ.

If not all send events are relevant events, however, then Equations 2
and 3 need not hold. For example, suppose e_i^a and e_j^b are relevant
events that are pairwise inconsistent: e_i^a → e_j^b ∧
∃e_i^c : e_i^a → e_i^c → e_j^b. If no such e_i^c is a relevant event, then
V(e_i^a)[i] = V(e_j^b)[i] and V(e_j^b)[j] > V(e_i^a)[j], which satisfies
the left side of Equation 2 even though e_i^a and e_j^b are pairwise
inconsistent. On the other hand, σ_i^a|Φ = σ_i^c|Φ, and so the fact that
e_i^a and e_j^b are pairwise inconsistent is irrelevant with respect to Φ,
as long as e_i^c and e_j^b are pairwise consistent.

We therefore define a type of vector clock for which a weaker version of
Equation 2 holds. Let the weak vector clock V_Φ for Φ be the vector clock
in which V_Φ(e_i)[i] counts only the events relevant to Φ that p_i has
executed through e_i. Therefore, the vector timestamp associated with
several events of p_i may have the same value, but all such events result
in the same local state relative to Φ.² For example, in the case of deadlock the relevant
events are sending a request for a resource, sending a grant of a resource, 
and receiving a grant of a resource (see Section 5.2). If a process p requests 
a resource and then sends an unrelated message, then the send event does 
not change the local state of p with respect to possible deadlock. So, the 
send event is given the same weak vector timestamp as the resource request 
event. 
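
A minimal Python sketch of a weak vector clock follows. It assumes that
every message carries the sender's current vector whether or not the send
is relevant (the text above does not spell this out), and the class and
parameter names are hypothetical. Only relevant events advance the local
component; a receive still merges the piggybacked vector.

```python
class WeakClockProcess:
    """Process p_i with a weak vector clock for some property (0-based)."""

    def __init__(self, i, n):
        self.i = i
        self.V = [0] * n

    def execute(self, relevant, msg_V=None):
        """Execute one event and return its weak vector timestamp.

        relevant: whether the event changes the local state relative
                  to the property being detected.
        msg_V:    the vector carried by the message, for a receive event.
        """
        if relevant:
            self.V[self.i] += 1
        if msg_V is not None:
            for k in range(len(self.V)):
                if k != self.i:
                    self.V[k] = max(self.V[k], msg_V[k])
        return list(self.V)
```

In the deadlock example, a resource request (relevant) followed by an
unrelated send (not relevant) yields the same timestamp for both events.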

We say that two events e_i^a and e_i^b are equivalent with respect to Φ,
written e_i^a ~_Φ e_i^b, if e_i^a and e_i^b have the same weak vector
timestamp. We say that two local states σ_i^a and σ_i^b are equivalent with
respect to Φ, written σ_i^a ~_Φ σ_i^b, if e_i^a ~_Φ e_i^b. Similarly, two
global states Σ = {σ_1^{a_1}, ..., σ_n^{a_n}} and
Σ' = {σ_1^{b_1}, ..., σ_n^{b_n}} are equivalent with respect to Φ, written
Σ ~_Φ Σ', if for all i, σ_i^{a_i} ~_Φ σ_i^{b_i}. The following versions of
Equations 1, 2, and 3 hold for both vector clocks and weak vector clocks:

    \forall i, j : i \neq j : V_\Phi(e_i^a)[i] \le V_\Phi(e_j^b)[i] \;\equiv\;
        \exists e_i^c, e_j^d : (e_i^c \sim_\Phi e_i^a) \wedge (e_j^d \sim_\Phi e_j^b) : e_i^c \rightarrow e_j^d    (4)

    (V_\Phi(e_i^a)[i] \ge V_\Phi(e_j^b)[i]) \wedge (V_\Phi(e_j^b)[j] \ge V_\Phi(e_i^a)[j]) \;\equiv\;
        \exists e_i^c, e_j^d : (e_i^c \sim_\Phi e_i^a) \wedge (e_j^d \sim_\Phi e_j^b) : e_i^c \text{ and } e_j^d \text{ are pairwise consistent}    (5)

    \forall i, j : V_\Phi(e_i)[i] \ge V_\Phi(e_j)[i] \;\equiv\;
        \exists \text{ global state } \Sigma' : \Sigma' \sim_\Phi \{\sigma_1, \ldots, \sigma_n\}    (6)

Figure 2 shows weak vector clock values for the execution shown in
Figure 1, where we assume that the predicate of interest references x and
y, but not u nor any of the channel states. Note that although the events
x := 1 and y := 3 do not form a consistent cut, their timestamps satisfy
Equation 6 since there exist several cuts equivalent to this inconsistent
cut (all necessarily having (x = 1, y = 3)) and they are therefore
consistent with respect to Φ.

Figure 2: Execution with weak vector clocks. (Diagram not reproduced; the
surviving event labels are x := 1, u := 1, and x := 2.)

² Note that two events of different processes may have the same weak vector
timestamp as well.


2.3 Locally Stable Properties 

Our protocol will detect a subset of the stable properties that we call
locally stable properties. Informally, a stable property Φ is locally
stable if no process involved in the property can change its state relative
to Φ an unbounded number of times once Φ holds. For example, suppose Φ is
"processes p_i and p_j are deadlocked." The property Φ is locally stable
because once Φ becomes true, neither p_i nor p_j, the processes involved in
Φ, can execute any event that could affect Φ (e.g., requesting or granting
a resource). Hence, σ_i|Φ and σ_j|Φ remain constant once Φ holds.

More formally, let G be the set of all global states that the system can
attain. For any Σ ∈ G, define Σ|Φ to be the subset of Σ that is referenced
in the formulation of Φ, and given a set of processes A define Σ_A to be
the subset of Σ that consists of the states of the processes in A. We will
call Φ locally stable if it is stable and if it satisfies the following
condition: consider any Σ ∈ G that satisfies Φ, and let A be the set of
processes p_i such that σ_i|Φ does not change an unbounded number of times
in any execution starting at Σ. Then, for all Σ' ∈ G such that
Σ'_A|Φ = Σ_A|Φ, Φ holds in Σ'. In other words, Φ can be determined from
only the states of the processes in A. Note that A can be empty, but only
for trivial stable properties; if A is empty, then Φ can be determined
without knowledge of the state of any process or channel and must therefore
be valid or not valid in all states. For this reason, we will assume in
this paper that A is nonempty.

The most commonly studied stable properties (deadlock, termination,
and lack of a token) are all locally stable. For example, if Σ is a deadlock
state, then A includes the deadlocked processes, and so the presence of
deadlock can be determined by considering only the states of the processes
in A. An example of a stable property that is not locally stable is the
property "there are no more than k : k > 0 tokens" in a system where
tokens cannot be created but can be lost when passed. This is because if
Σ is a state in which there are k tokens, then every process can execute a
relevant event an unbounded number of times (namely, it can pass tokens),
thereby changing its local state relative to Φ an unbounded number of times,
and so A is empty. The property, however, is not valid in all states of the
token passing system.³

For most locally stable properties of interest, the processes in A cannot
change their local states relative to Φ at all once Φ holds (i.e., the bound
on the number of future relevant events that they can execute is zero). In
this case, if our protocol presented below is initiated in a state in which Φ
holds, it is guaranteed to detect Φ. For locally stable properties for which
the bound is not zero, however, the protocol may not detect Φ if initiated
in a state in which Φ holds. However, the system will eventually reach a
state in which all processes in A will execute no further relevant events. If
the protocol is initiated in such a state, it is guaranteed to detect Φ. Thus,
though the protocol eventually detects Φ in all cases, it detects Φ "sooner"
when the bound is zero.

³ A need not be empty for a stable property to be not locally stable. For example,
suppose Φ is again "there are no more than k : k > 0 tokens" and the token passing
system consists of red and green processors. Furthermore, only red processors can lose
tokens and a green processor never passes a token to a red processor. In this system, A
is the set of green processors (green processors never execute a relevant event), yet the
validity of Φ depends on the states of both the green and red processors. Hence, Φ is not
locally stable.

3 Protocol 

3.1 Basic Protocol 

We first assume that a process p_0 will determine whether the global state of
the processes P = {p_1, ..., p_n} satisfies a locally stable property Φ. Later,
we will change this protocol so that any number of processes in P may
concurrently assume the role of p_0.

Our protocol is based on the notion of a consistent subcut: a set of
events whose timestamps satisfy Equation 6. (The state of a single process
is trivially a consistent subcut.) Informally, the protocol works as follows.
Whenever a process p_i executes a relevant event e_i, p_i records in a
buffer B_i its local state relative to Φ and the vector timestamp V_Φ(e_i)
as B_i.σ and B_i.V, respectively. Process p_0 periodically collects the
values of the buffers in any order, yielding a set B = {B_1, B_2, ..., B_n}.
Once p_0 has constructed this set, p_0 determines if there exists a maximal
consistent subcut of B such that the states associated with the timestamps
in the subcut satisfy Φ. If p_0 can find such a subcut, then Φ must
currently hold. Note that p_0 need not examine all consistent subcuts; if
A' ⊂ A and Φ holds in Σ_{A'}|Φ, then Φ will also hold in Σ_A|Φ, so we need
examine only the maximal consistent subcuts of B. Of course, Φ may be of
the form ∀p_i : Φ(p_i), in which case only a full consistent cut will
satisfy Φ.

Unfortunately, the number of maximal subcuts of a set of n weak vector
clocks is Ω(2^n). Fortunately, it is not necessary for p_0 to examine all of
these subcuts. Suppose the set of buffer values contains B_i and B_j that
are inconsistent: B_i.V[i] < B_j.V[i]. These two states violate Equation 6,
and so both cannot be part of the same consistent subcut. However,
B_i.V[i] < B_j.V[i] implies that p_i executed a relevant event between the
time that B_i.σ was recorded and the time that B_j.σ was recorded.
Therefore, p_0 need not consider subcuts containing B_i.σ: if the system is
in a state such that the processes involved in Φ will execute no more
relevant events, then B_i.σ cannot be necessary for the detection of Φ and
so need not be considered. Otherwise, the system will eventually reach such
a state. If B_i.σ is involved in determining Φ, then B_i will be recorded
such that B_i.V[i] ≥ B_j.V[i] for all j. Thus, given a set of buffered
values B and the partial order ∀B_i, B_j ∈ B : B_i ≻ B_j ≝
B_i.V[j] > B_j.V[j], p_0 need only find the greatest elements of B with
respect to ≻, which can be done in Θ(n^2) time.⁴ We call this subcut the
latest subcut of B. The latest subcut is clearly a maximal subcut, since
all states that are not part of the latest subcut are inconsistent with
some state in the latest subcut. This gives us the protocol shown in
Figure 3.

• Each process p_i ∈ P records σ_i and V_Φ(e_i) in buffer B_i upon
executing a relevant event e_i.

• Periodically, p_0 collects all of the buffers B_i and extracts from them
the latest subcut {σ_i : ∀j : B_j.V[i] ≤ B_i.V[i]}.

• p_0 detects Φ if Φ holds on the latest subcut.

Figure 3: Basic Protocol
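
The extraction of the latest subcut can be sketched directly from its
definition. In this illustrative Python fragment the buffers are assumed to
be given as a dictionary from process index to a (state, timestamp) pair;
the function name latest_subcut and this representation are hypothetical
choices, not part of the protocol description.

```python
def latest_subcut(buffers):
    """Return {i: state_i} for every process i whose buffered entry is
    not superseded, i.e. B_j.V[i] <= B_i.V[i] for all j.

    buffers: dict mapping process index i to (state, V), where V is the
             weak vector timestamp recorded with the state.
    Each entry is compared against every other, so the work is
    quadratic in the number of buffers.
    """
    return {
        i: state
        for i, (state, V) in buffers.items()
        if all(Vj[i] <= V[i] for (_, Vj) in buffers.values())
    }
```

For example, if p_1's buffer shows that it has seen two relevant events of
p_0 while p_0's collected buffer records only one, p_0's entry is dropped
from the subcut and the remaining entries are kept.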

The soundness of this protocol is straightforward. We now argue that the
protocol is complete as well; that is, if Φ holds, then our protocol will
detect Φ. Let Σ be the first global state in which Φ holds. Since Φ is
locally stable, there is a nonempty set of processes A each of which
executes a bounded number of relevant events after Σ; these processes will
not change their states relative to Φ nor update their vector clocks an
unbounded number of times in any run starting at Σ. Suppose that p_0
initiates the protocol in or after Σ (i.e., when Φ holds). Because the
processes in A can each execute only a bounded number of relevant events
after Σ, and because message delivery time is finite, there is some global
state Σ' reachable in finite time from Σ after which the processes in A
execute no more relevant events. Therefore p_0 will eventually collect the
states Σ'_A|Φ. From the definition of ≻, the state of a process p_i in A
must be in any latest subcut constructed by p_0 because p_i will execute no
more relevant events. Since Φ is stable, Σ' satisfies Φ, and since Φ is
locally stable, Φ can be detected by examining Σ'_A|Φ. Hence, p_0 will
detect Φ.

⁴ The greatest elements of ≻ can be found by discarding any values B_j such that
∃i : B_i.V[j] > B_j.V[j], which can be done in O(n^2) time using a straightforward algorithm.
And, if all values are incomparable then all the values are greatest elements of ≻. To
determine that they are all incomparable takes n^2 comparisons, and so the problem is
Θ(n^2).

3.2 Decentralization 

In the above protocol, p_0's role is to collect the local states, determine the
latest subcut, and check if $ holds in this subcut. We can decentralize these 
steps by collecting the local states in a token. 

Consider a token K that consists of n entries (D_1, ..., D_n) where each
entry D_i = (B_i.σ, B_i.V[i]); that is, D_i will hold the state of p_i
relevant to Φ and the local component of p_i's vector clock when it
generated B_i.σ. Assume that there exists a special value ⊥ for D_i
indicating that the state is not in the token; all of the D_i in K are
initially set to ⊥.

To determine whether Φ holds, a process generates a token K, inserts
its state and vector clock value into K, and passes the token to any other
process. When a process p_j receives a token K, it takes the following steps:

1. Set D_j to (B_j.σ, B_j.V[j]).

2. For all non-⊥ values of D_k that are not in the latest subcut, set D_k
to ⊥.

3. Determine whether the state values in K satisfy Φ. If so, then the
detection is made; otherwise, p_j forwards the token to a process p_k,
chosen fairly, that has D_k = ⊥. If there is no such process, then p_j
can either drop the token or, when p_j computes a new value of B_j, p_j
can restart at Step 1 with this token.

Note that when process p_j executes rule 2, B_j must be part of the latest
subcut; if it were not, then there would exist a recorded value of B_l in
D_l such that B_j.V[j] < B_l.V[j]. This implies that p_l knows of a
relevant event executed by p_j that results in a state causally after the
state recorded in B_j, which violates the definition of B_j. Thus, only the
earlier values D_k need be tested with respect to B_j. From above, the
value B_k in D_k can be discarded if B_k.V[k] < B_j.V[k]. The value
B_k.V[k] is stored in D_k.V, so K carries enough information for p_j to
make this test.

The resulting protocol is summarized in Figure 4. Note that we have 
no a priori restriction on how many tokens there can be in the system at 
any time or on the order in which the token is passed, other than that it is 
passed in a fair manner. These decisions can be made when the protocol is 
applied to a particular problem. 

If this protocol is initiated in a state in which Φ holds and after which
no process executes a relevant event, then Φ will be detected with no more
than n token passes. However, if processes do execute relevant events after
the protocol is initiated, then the initial detection may not be successful and
the protocol must be restarted. If the number of relevant events that can be
executed after Φ holds is bounded by A, then detection can take up to an
additional An token passes. For large A, our protocol could perform worse
than a snapshot protocol. In practice, however, we do not expect A to be 
large. 
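
The three steps a process takes on receiving a token can be sketched as a
single Python function. This is a simplified illustration under several
assumptions: the token is a list indexed from 0, None stands in for the
special "not in the token" value, the property test is passed in as a
hypothetical callback holds, and the next holder is simply the first
process with an empty entry rather than one chosen fairly.

```python
BOTTOM = None  # stands for the special "state not in the token" value


def on_token(j, token, B_state, B_V, holds):
    """Steps 1-3 for process p_j.

    token:   list of entries, each BOTTOM or (state, local clock component)
    B_state: p_j's buffered state relative to the property
    B_V:     p_j's buffered weak vector timestamp
    holds:   predicate over the token entries (hypothetical callback)
    Returns ("detected", None), ("forward", k), or ("drop", None).
    """
    # Step 1: insert p_j's own state and local clock component.
    token[j] = (B_state, B_V[j])
    # Step 2: discard entries that p_j knows to be out of date.
    for k, entry in enumerate(token):
        if k != j and entry is not BOTTOM and entry[1] < B_V[k]:
            token[k] = BOTTOM
    # Step 3: test the property, else forward to a process not in the token.
    if holds(token):
        return ("detected", None)
    missing = [k for k, entry in enumerate(token) if entry is BOTTOM]
    return ("forward", missing[0]) if missing else ("drop", None)
```

A real deployment would replace the "first missing" choice with a fair
selection and would decide, per application, whether a process with no one
to forward to drops the token or holds it until its buffer changes.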

4 Termination Detection 

• Each process p_i ∈ P records σ_i and V_Φ(e_i) in buffer B_i upon
executing a relevant event e_i.

• When p_i wants to detect Φ, p_i generates a token (D_1 := ⊥, ...,
D_n := ⊥), sets D_i to (B_i.σ, B_i.V[i]), and forwards the token to any
other process.

When p_j receives a token:

• p_j sets D_j to (B_j.σ, B_j.V[j]).

• For each D_k : k ≠ j, D_k ≠ ⊥, p_j sets D_k := ⊥ if
D_k.V = B_k.V[k] < B_j.V[k].

• p_j determines if Φ holds on the state values in the token. If not, p_j
forwards the token to any p_k such that D_k = ⊥.

Figure 4: Decentralized Protocol

We now instantiate the general protocol given above to obtain a protocol
that detects termination in a distributed system. There are many
variations of this property; the earliest that we know of is due to
Dijkstra and Scholten [DS80]. The following definition is the same as that
given in [Mis83].

All processes are either active or idle. Only active processes can send 
messages. An active process may become idle at any time, and an idle 
process can become active only upon receipt of a message. The system 
is terminated when all processes in the system are idle and there are no
messages in transit. 

The local state of a process relative to termination consists of whether 
the process is active or idle and whether there is a message on an incoming 
channel. Therefore, the events that are relevant to termination are sending 
a message, receiving a message, becoming idle, and becoming active. Each 
process will update its (weak) vector clock upon executing any of these 
events. Note that for this problem, we do not need to keep track of the 
contents of the messages exchanged between processes; only the number of 
messages is important. To capture the channel states, we have each process 
keep track of how many messages it has sent and received on each adjacent 
channel. The combined information of all of the processes will then yield
the number of messages in transit on each channel: if p_i has sent more
messages to p_j than p_j has received from p_i, then there is at least one
message on channel C_ij. In this way, we can represent the relevant channel
states without recording an unbounded number of messages. 

We instantiate the general protocol given in Section 3.2 as follows. 

Each process p_i maintains the following local state variables:

• active_i: Boolean = true if and only if p_i is active.

• send_i[1..n]: Integer array. send_i[j] = the number of messages that p_i
has sent to p_j. All are initially 0.

• recv_i[1..n]: Integer array. recv_i[j] = the number of messages that p_i
has received from p_j. All are initially 0.

When p_i sends a message to p_j, send_i[j] is incremented. When p_i receives
a message from p_j, recv_i[j] is incremented. When p_i becomes active or idle,
active_i is set appropriately.

At some point, an idle process will start the detection protocol by cir- 
culating a token as described in Section 3.2. The termination condition can 
only be evaluated over a total global state (as opposed to a consistent proper 
subset of the process states), so a positive determination can be made only 
by the process p_j that is the last to add its state to the token.

Process pj detects termination if and only if the following three condi- 
tions hold: 

1. The timestamps in the token form a consistent cut; 

2. All processes are idle: ∀i : active_i = false;

3. There are no messages in transit: ∀i, j : send_i[j] = recv_j[i].

The following theorem and corollary show that item 1 is redundant. The 
theorem assumes for simplicity that the buffered states are not collected in 
a token; the corollary removes this assumption. 
Theorem 1 Let B = {B_i : i = 1, 2, ..., n} be a set of buffered state values
that were recorded in a system that does not collect states in a token D. If
in B, ∀i, j : send_i[j] = recv_j[i], then the global state defined by B is
consistent: ∀i, j : B_i.V[i] ≥ B_j.V[i].

Proof: Suppose by way of contradiction that item 3 holds over B but the
timestamps in B form an inconsistent cut: B_i.V[i] < B_j.V[i]. B_j.V[i]
is advanced only when p_j receives a message, and events local to p_j affect
only B_j.V[j]. Therefore, in order for B_j.V[i] to advance beyond the
recorded B_i.V[i], there must have been a chain of messages between p_i and
p_j between the time that B_i was collected and the time that B_j was
collected. This implies that there is some k such that the recorded
send_i[k] < recv_k[i], contradicting the assumption that item 3 holds.

□

Corollary 2 Let B = {B_i : i = 1, 2, ..., n} be a set of buffered state values
that were collected in a token D. If in D, ∀i, j : send_i[j] = recv_j[i],
then the global state defined by D is consistent:
∀i, j : D_i.V[i] ≥ D_j.V[i].

Proof: None of the events executed in collecting the buffered states into a
token are relevant. Hence, collecting the states in this way has no effect
on their consistency with respect to termination. The buffered states will
therefore be consistent with respect to termination when items 2 and 3 hold.

□

Corollary 2 implies that the vector clocks need not be maintained. Fur- 
thermore, these checks can be done incrementally. For example, we can 
assign a total order to the processes and have the token passed along that 
total order. When process p_k receives the token, it tests to see if

    \neg active_k \wedge (\forall l : 1 \le l < k : (send_k[l] = recv_l[k]) \wedge (send_l[k] = recv_k[l])).

If this condition does not hold, then p_k can drop the token. If the
condition holds and k = n, then termination is detected; otherwise, p_k fills
in D_k and passes the token to p_{k+1}.
This yields the protocol given in [Mat87] as the channel counting pro-
tocol, which requires only n messages to detect termination once it holds, 
and which can be further refined into a protocol that is space-efficient. This 
is a good example of how our general protocol, which constructs consistent 
(sub)cuts explicitly, can be used to derive a much simpler protocol that 
constructs consistent cuts implicitly. 
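
The incremental test performed by each process as the token travels along
the total order can be sketched as follows. This is a hedged illustration,
not the protocol of [Mat87] itself: the function name check_and_fill, the
dictionary layout of a token entry, and the 0-based process indices are all
assumptions made for the example.

```python
def check_and_fill(k, token, active, send, recv):
    """Process p_k (0-based) handles the token.

    token: list of entries already filled in by p_0 .. p_{k-1}; each entry
           is a dict holding that process's "send" and "recv" counters.
    active, send, recv: p_k's own local state variables.
    Returns "drop", "forward", or "detected".
    """
    if active:
        return "drop"
    for l in range(k):
        prev = token[l]
        # Both channels between p_k and p_l must be empty.
        if send[l] != prev["recv"][k] or prev["send"][k] != recv[l]:
            return "drop"
    token.append({"send": list(send), "recv": list(recv)})
    # The last process in the total order makes the positive determination.
    return "detected" if k == len(send) - 1 else "forward"
```

With two idle processes where p_0 has sent one message that p_1 has
received, p_0 forwards the token and p_1 detects termination; if the
message were still in transit, p_1 would drop the token instead.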

5 Deadlock Detection 
5.1 k-out-of-m Deadlock

We now instantiate the general protocol given in Section 3 to obtain a proto-
col that detects k-out-of-m deadlock in a distributed system. This problem
was first formulated and solved in [BT84]. In this formulation, a process
can request k resources from a pool of m resources.

A process is either active or blocked, where an active process is one
that is not waiting for any other process. Active processes may issue
k-out-of-m requests in the following way. When an active process p_i
requires k processes to carry out some request, it sends request messages
to each of the m processes that can perform this action. Process p_i then
becomes blocked, and waits until the action requested is carried out by at
least k of the m processes. A process cannot send any further requests
while blocked, but a process can receive request messages while blocked.

Only active processes can carry out a requested action. If a process p_j
receives a request from p_i while active, it will either become blocked or
carry out p_i's requested action within finite time. In the latter case,
p_j will send a grant message to p_i. When p_i receives k grant messages,
it becomes active again. It then relinquishes the requests made to the rest
of the processes to which it sent request messages by sending them
relinquish messages. We assume that the recipient of a relinquish message
acknowledges the message and that the sender of a relinquish message waits
for all acknowledgements before sending another request message. By doing
so, we guarantee that p_i can discard any grant messages received after the
first k are received.

The state of a process p_i relative to k-out-of-m deadlock consists of
the number of grants needed for p_i to become active and the current set
of processes that p_i is waiting for. We capture this state by having each
process keep track of the processes on which it is blocked and the number 
of grant messages that it has sent and received on each adjacent channel. 

We instantiate the general protocol given in Section 3.2 as follows. Each
process p_i maintains the following local state variables:

• k_i: Integer. The number of grant messages required for p_i to become
active (initially 0).

• g_send_i[1..n]: Integer array. g_send_i[j] is the number of grant
messages that p_i has sent to p_j (all are initially 0).

• g_recv_i[1..n]: Integer array. g_recv_i[j] is the number of grant messages
that p_i has received from p_j (all are initially 0).

• wf_i: Integer set. These are the processes that p_i is waiting for. When
p_i sends a request message to p_j, wf_i := wf_i ∪ {j}; when p_i receives
a grant message from p_j or sends a relinquish message to p_j,
wf_i := wf_i − {j}.
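As an illustration only, the local state variables listed above can be sketched in Python. The class name `LocalState`, its method names, the 0-based process indices, and the choice to decrement `k` each time a grant is received are assumptions made for this sketch; the text does not prescribe them.

```python
class LocalState:
    """Hypothetical sketch of the local state of p_i for k-out-of-m deadlock."""

    def __init__(self, pid, n):
        self.pid = pid            # index i of this process (0-based here)
        self.n = n                # number of processes in the system
        self.k = 0                # grants still required to become active
        self.g_send = [0] * n     # g_send[j]: grants sent to p_j
        self.g_recv = [0] * n     # g_recv[j]: grants received from p_j
        self.wf = set()           # processes this process is waiting for

    def request(self, k, targets):
        """Issue a k-out-of-m request to the given target processes."""
        if self.k > 0:
            raise RuntimeError("a blocked process cannot send further requests")
        self.k = k
        self.wf |= set(targets)

    def send_grant(self, j):
        """Record that a grant message was sent to p_j."""
        self.g_send[j] += 1

    def receive_grant(self, j):
        """Record a grant from p_j and stop waiting for p_j."""
        self.g_recv[j] += 1
        self.wf.discard(j)
        if self.k > 0:            # assumption: k counts grants still needed
            self.k -= 1

    def relinquish(self, j):
        """Record that a relinquish message was sent to p_j."""
        self.wf.discard(j)


p0 = LocalState(pid=0, n=3)
p0.request(k=1, targets=[1, 2])   # a 1-out-of-2 request
p0.receive_grant(1)               # the first grant unblocks p0
p0.relinquish(2)                  # p0 gives up the outstanding request
```

Each method corresponds to one of the events that changes the variables; acknowledgements of relinquish messages are not modelled here.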

Deadlock is determined by constructing and reducing the system
waits-for graph. This graph is constructed as follows:

• a waits-for edge is drawn from p_i to p_j if (j ∈ wf_i) ∧ (g_send_j[i] =
g_recv_i[j]). That is, p_i is waiting for a resource from p_j and no grant
message is in transit from p_j to p_i.

• the number of grants κ_i needed for p_i to be unblocked is
κ_i = k_i − |{j : g_send_j[i] ≠ g_recv_i[j]}|. That is, κ_i is the number of
grants that p_i is waiting for less the number of grants in transit to p_i.

Deadlock is tested by reducing this graph as follows: if an edge points from
p_i to p_j and p_j is active, then the edge can be erased and κ_i can be reduced
by one; and if a process has κ_i = 0, then all of its outgoing edges can be
erased. The system is deadlocked if and only if there are edges that cannot
be removed by following these two rules.
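The construction and the two reduction rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dictionary layout of the recorded states, and the reading of "active" as "has no remaining outgoing edge" are all choices made for this sketch.

```python
def proc(k, wf, g_send=None, g_recv=None):
    """Hypothetical helper: the recorded local state of one process."""
    return {"k": k, "wf": set(wf), "g_send": g_send or {}, "g_recv": g_recv or {}}


def build_waits_for(states):
    """Build the waits-for edges and the reduced grant counts kappa."""
    edges = {i: set() for i in states}
    kappa = {}
    for i, s in states.items():
        in_transit = 0
        for j, t in states.items():
            sent = t["g_send"].get(i, 0)      # grants p_j has sent to p_i
            received = s["g_recv"].get(j, 0)  # grants p_i has received from p_j
            if sent != received:
                in_transit += 1               # a grant is in transit from p_j
            elif j in s["wf"]:
                edges[i].add(j)               # p_i really is waiting on p_j
        kappa[i] = s["k"] - in_transit
    return edges, kappa


def is_deadlocked(states):
    """Apply the two reduction rules until nothing changes."""
    edges, kappa = build_waits_for(states)
    changed = True
    while changed:
        changed = False
        for i in edges:
            # Rule 2: a process that needs no more grants drops all its edges.
            if edges[i] and kappa[i] <= 0:
                edges[i].clear()
                changed = True
            # Rule 1: an edge to an active process is erased.
            for j in list(edges[i]):
                if not edges[j]:              # assumption: no out-edges = active
                    edges[i].discard(j)
                    kappa[i] -= 1
                    changed = True
    return any(edges.values())


cycle = {0: proc(1, {1}), 1: proc(1, {0})}
one_of_two = {0: proc(1, {1, 2}), 1: proc(1, {0}), 2: proc(0, set())}
two_of_two = {0: proc(2, {1, 2}), 1: proc(1, {0}), 2: proc(0, set())}
granted = {0: proc(1, {1}), 1: proc(0, set(), g_send={0: 1})}
```

In these examples, `cycle` and `two_of_two` leave irreducible edges, while `one_of_two` reduces completely because the active process p_2 can satisfy p_0's 1-out-of-2 request, and `granted` has no edge at all because a grant is already in transit to p_0.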

In this system, the relevant events of p_i are those that change wf_i, k_i,
g_send_i and g_recv_i. Hence, the relevant events are requesting a resource,
sending a grant message, receiving a grant message and sending a
relinquish message. We can now argue that k-out-of-m deadlock is locally
stable: a deadlocked process can execute only a bounded number of relevant
events (namely, it can receive up to k_i − 1 grant messages), and any valid
global state that contains the local states of the deadlocked processes still
yields an irreducible waits-for graph.

The deadlock detection protocol is as follows. When a process p_i wishes
to test for deadlock, p_i generates a token, fills D_i with ((wf_i, g_send_i,
g_recv_i, k_i), B_i.V[i]), and forwards the token to some p_j ≠ p_i. Upon
receiving a token, a process p_j sets D_j to ((wf_j, g_send_j, g_recv_j, k_j),
B_j.V[j]) and discards all values D_k that are inconsistent with D_j by setting
D_k to ⊥. p_j then checks to see if deadlock holds on the remaining values by
constructing the waits-for graph and reducing it. If deadlock does not hold,
then p_j forwards the token to any process p_k such that D_k = ⊥.

We can improve this protocol by choosing the process to which the token 
is passed more carefully. Since we would like to detect deadlock as quickly 
as possible, the forwarding process should choose a process that is likely to 
add information leading to the detection of a deadlock. A reasonable choice 
is a process p_j such that D_j = ⊥ and such that p_j is in wf_i for some D_i ≠ ⊥.
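A small Python sketch of this forwarding choice follows. The function name `choose_next`, the use of `None` for ⊥, and the fallback to any unfilled entry when no preferred candidate exists are assumptions of the sketch rather than part of the protocol as stated.

```python
def choose_next(D):
    """Pick the next token holder from the token contents D.

    D[j] is None when D_j = ⊥, otherwise a dict with a "wf" set.
    Returns a process index, or None if every entry is filled.
    """
    # Processes that some recorded (non-⊥) state is waiting for.
    waited_on = set()
    for entry in D:
        if entry is not None:
            waited_on |= entry["wf"]

    unfilled = [j for j, entry in enumerate(D) if entry is None]
    preferred = [j for j in unfilled if j in waited_on]
    if preferred:
        return preferred[0]       # likely to add information about a deadlock
    return unfilled[0] if unfilled else None


token = [{"wf": {2}}, None, None]  # p_0 is recorded and waits for p_2
next_holder = choose_next(token)
```

Here the token goes to p_2 rather than p_1, because only p_2 can extend a waits-for path that starts at a recorded state.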

The full protocol is presented in Figure 5. We assume that the process 
p_i that generates the token does so because it suspects that it is involved in
a deadlock; that is, wf_i is not empty.

when p_i receives token (D_1, ..., D_n):
begin
    D_i.a := (k_i, g_send_i, g_recv_i, wf_i);
    D_i.V := B_i.V[i];
    for all D_j : D_j.V < B_i.V[j] : D_j := ⊥;
    if there exists p_j : (D_j = ⊥) ∧ (∃ p_l : j ∈ wf_l)
        then forward token to one of these p_j
    else begin
        construct waits-for graph;
        reduce waits-for graph;
        if graph is not fully reduced then signal deadlock
        else drop the token
    end
end

Figure 5: Protocol for Detecting k-out-of-m Deadlock

5.2 RPC Deadlock Detection

1-out-of-1 deadlock is a special case of k-out-of-m deadlock that lends itself
to further optimization. This type of deadlock is called RPC deadlock
because it can occur in a remote procedure call system, where making a remote
procedure call is analogous to requesting a resource from a single processor.
The waits-for graph is constructed as for k-out-of-m deadlock, except that
k_i = |wf_i| and thus need not be represented in the waits-for graph.
Furthermore, relinquish messages are not needed and the waits-for graph is
reducible if and only if it does not contain a cycle.
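Because each blocked process waits for exactly one other process in the RPC case, checking for a cycle amounts to following a single chain of waits-for edges. The Python sketch below illustrates this on a frozen snapshot; the function name, the dictionary representation of wf, and the omission of grant counters and vector clocks are simplifying assumptions, so it shows only the graph-theoretic idea and not a full protocol.

```python
def passes_until_detection(wf, initiator):
    """Follow waits-for edges from the initiator.

    wf maps each process to the single process it waits for, or None
    if it is active. Returns the number of hops taken before reaching
    a process that was already visited (a cycle), or None if the chain
    ends at an active process.
    """
    visited = {initiator}
    current = initiator
    passes = 0
    while wf.get(current) is not None:
        current = wf[current]
        passes += 1
        if current in visited:
            return passes         # the chain has closed into a cycle
        visited.add(current)
    return None                   # reached an active process: no cycle


ring = {0: 1, 1: 2, 2: 0}         # a cycle of length 3
chain = {0: 1, 1: 2, 2: None}     # p_2 is active
hops = passes_until_detection(ring, 0)
```

When the initiator lies on a cycle of length d, the walk closes after exactly d hops.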

We can instantiate our protocol for detecting RPC deadlock as follows.
As before, the relevant events are requesting a resource (here, making an
RPC request), sending a grant message (here, sending the reply to the RPC
request), and receiving a grant message. Any blocked process p_i can decide
to detect deadlock by generating an empty token, inserting its buffered state
into D_i, and passing the token to the (single) process in wf_i. When p_j
receives a token from p_i, p_j will verify that p_i waits-for p_j and will pass
the token on to the process blocking p_j. A process detects deadlock when it
receives a token that contains a complete cycle. The resulting protocol is
shown in Figure 6. Note that if the waits-for graph contains a d-cycle, then
the token need be passed only d times.

RPCDeadlock(p_j): cobegin

    do forever when (wf_j ≠ ∅) and (waited too long)
        create empty token K;
        K.D_j := (g_send_j[wf_j], B_j.V[j]);
        pass K to wf_j

 || do forever when receive token K from p_i
        if K.D_j = ⊥ then
            if wf_j ≠ ∅ and K.D_i.g_send_i[j] = g_recv_j[i]
               and ∀ K.D_k : K.D_k ≠ ⊥ : K.D_k.V[k] ≥ B_j.V[k]
            then
                K.D_j := (g_send_j[wf_j], B_j.V[j]);
                pass K to wf_j
            else skip    /* drop token K */
        else
            if K.D_i.g_send_i[j] = g_recv_j[i] then detect deadlock
            else skip    /* drop token K */

coend

Figure 6: RPC Deadlock Protocol, Original

This protocol can be further simplified by applying the following two 
theorems. 

Theorem 3 If (K.D_i.g_send_i[j] = g_recv_j[i]) then p_i has executed no
relevant event since setting K.D_i.

Proof: Assume (K.D_i.g_send_i[j] = g_recv_j[i]). The last relevant event that
p_i can have executed before setting K.D_i was to send a request message to
p_j. The first relevant event that p_i can have executed after setting K.D_i is
the receipt of a grant message from p_j. Since (K.D_i.g_send_i[j] = g_recv_j[i]),
p_j has sent no grant messages to p_i since p_i sent the request to p_j. Hence,
p_i can have executed no relevant event since setting K.D_i.

□ 

Theorem 4 If (K.D_i.g_send_i[j] = g_recv_j[i]) then no process that has set
its value in K has subsequently executed a relevant event.

Proof: Let l be the number of values D_k : D_k ≠ ⊥, and assume that
(K.D_i.g_send_i[j] = g_recv_j[i]). We will use induction on l.

Base case (l = 1). Follows directly from Theorem 3.

Induction case (l > 1). By the induction hypothesis, no process prior to
p_i had executed a relevant event when p_i received K. No process prior to
p_i can execute a relevant event until p_i does, and by Theorem 3 p_i has not
executed a relevant event since forwarding K to p_j.

□

Theorem 4 implies

(g_send_i[j] = g_recv_j[i]) ⇒ (∀k : D_k ≠ ⊥ : D_k.V = B_k.V[k] ≥ B_j.V[k]).

Thus, the vector clocks can be omitted and the token need only carry the
identity of the process that initiated the test for deadlock. The resulting
protocol, shown in Figure 7, first appeared in [CMH83] specialized for m = 1.

RPCDeadlock(p_j): cobegin

    do forever when (wf_j ≠ ∅) and (waited too long)
        send (g_send_j[wf_j], j) to wf_j

 || do forever when receive (s, k) from p_i
        if k ≠ j then
            if wf_j ≠ ∅ and s = g_recv_j[i]
                then send (g_send_j[wf_j], k) to wf_j
        else
            if s = g_recv_j[i] then detect deadlock

coend

Figure 7: RPC Deadlock Protocol, Refined

The protocol in Figure 6 can be easily generalized to detect
and-deadlock (m-out-of-m requests), since a cycle in the waits-for graph is
equivalent to deadlock in this case as well. The only change necessary is that
when p_j passes the token, it must replicate the token and pass a copy to each
process in wf_j. With and-deadlock, however, a process can execute a relevant
event while deadlocked: a deadlocked process can receive a proper subset
of the required grant messages. Thus, if the waits-for graph contains a
d-cycle, then even if tokens are generated by a deadlocked process and passed
along cycles, such tokens may be dropped up to n − d times before the
deadlock is detected. However, the protocol in Figure 7 can be effectively run
in parallel by having p_j send (g_send_j[u], k) to all the processes u ∈ wf_j,
in which case a token passed along a cycle will not be dropped. The resulting
deadlock detection protocol is the one presented in [CMH83].

6 Conclusion 

This paper defined a proper subclass of the stable properties which we denote 
the locally stable properties. This subclass is interesting in that a process 
that is “involved” in establishing the stable property is limited in what it
can do, and will eventually cease changing its local state with respect to
the stable property. Hence, in order to detect a locally stable property, a 
consistent cut need not be explicitly constructed — the relevant local states 
will form a consistent subcut implicitly. This leaves only the problem of 
detection. 

In order to make this observation, we needed to define consistent cuts 
with respect to a global state predicate, and slightly extend the notion of 
vector clocks to accommodate our definition. We then gave a simple and 
decentralized protocol that detects when a locally stable property occurs in 
a distributed system. The protocol can be easily refined, which we illustrate 
by refining it to a known protocol for termination detection, a new
protocol for k-out-of-m deadlock detection, and known protocols for m-out-of-m
deadlock detection.

In the reductions to the two known protocols, the vector clocks proved 
redundant. This was because the processes involved in Φ could execute
no relevant events once they established the condition of interest, and the
detection algorithm also ensured that the channels carried no undelivered
relevant messages. In both cases, the receipt of a relevant message was the
only relevant event that a process involved in Φ could execute, and so an
empty channel between two processes implied pairwise consistency of the 
recorded states of those two processes. This observation is similar to one of 
the steps in the refinement of a termination protocol given in [CM86], yet 
we have not been able to refine our protocol to their termination protocol. 

The class of locally stable properties was defined in proving the protocol 
correct. We would like to determine what kinds of properties are locally 
stable. We know of two general classes: the locally stable properties of 
distributed garbage detection, termination, and global deadlock can all be 
expressed as detecting no token in a generalized token passing system, yet 
deadlock of a subset of the processes does not seem to be so expressible. We 
are interested in whether there are other classes of locally stable properties. 

Not all interesting stable properties are locally stable, however. For
example, the property “the number of tokens is less than k > 0” in a token
passing system that can lose but not regenerate tokens is stable but not 
locally stable. We do not know if there are protocols that are more message- 
efficient than snapshot protocols for detecting such properties. 

Our work was motivated by trying to derive message-efficient special- 
purpose detection protocols from a general detection protocol. We have only 
been partially successful. Our protocol is most efficient when no process can 
execute a relevant event after the condition of interest holds. Furthermore, in 
our derivation of the m-out-of-m deadlock detection protocol in Section 5.2, 
our protocol could generate O(n) extra messages. Hence, we would like to
better understand the notion of relevant events and weak vector clocks.